Analysis of GLDS-147 from NASA GeneLab

This R Markdown file was auto-generated by the iDEP website using iDEP 0.91, originally by Steven

Ge SX, Son EW, Yao R: iDEP: an integrated web application for differential expression and pathway analysis of RNA-Seq data. BMC Bioinformatics 2018, 19(1):534. PMID:30567491

1. Read data

First we set up the working directory to where the files are saved.

 setwd('~/Documents/HTML_R/GLDS147')

R packages and iDEP core functions. Users can also download the iDEP_core_functions.R file. Many R packages need to be installed first, which may take hours. Each of these packages took years to develop, so be a patient thief. Sometimes dependencies need to be installed manually. If you are using an older version of R and have trouble with package installation, try uninstalling the current version of R, deleting all of its folders and files (e.g. C:/Program Files/R/R-3.4.3), and reinstalling from scratch.

 if(file.exists('iDEP_core_functions.R'))
    source('iDEP_core_functions.R') else 
    source('https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/iDEP_core_functions.R') 
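
Before sourcing the core functions, it can help to see which packages are still missing. The following is a minimal sketch and not part of iDEP: the helper name `missing_packages` and the package list are assumptions for illustration (knitr, kableExtra and DESeq2 appear later in this report; the full list needed by iDEP_core_functions.R is longer).

```r
# Hypothetical helper (not part of iDEP): return the packages from `pkgs`
# that cannot be loaded from the current R library.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Illustrative subset only; the real dependency list is longer.
needed <- c("knitr", "kableExtra", "DESeq2")
to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Packages still to install: ", paste(to_install, collapse = ", "))
}
```

Checking first avoids re-installing packages that are already present, which matters when a full installation can take hours.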

We are using the downloaded gene expression file, in which gene IDs have been converted to Ensembl gene IDs, because the ID conversion database is too large to download. You can use your original file if it already uses Ensembl IDs, or if you do not want to use the pathway files available in iDEP (or they are not available).

 inputFile <- 'GLDS147_Expression.csv'
 sampleInfoFile <- 'GLDS147_Sampleinfo.csv'
 gldsMetadataFile <- 'GLDS147_Metadata.csv'
 geneInfoFile <- 'Arabidopsis_thaliana__athaliana_eg_gene_GeneInfo.csv' #Gene symbols, location etc. 
 geneSetFile <- 'Arabidopsis_thaliana__athaliana_eg_gene.db'  # pathway database in SQL; can be GMT format 
 STRING10_speciesFile <- 'https://raw.githubusercontent.com/iDEP-SDSU/idep/master/shinyapps/idep/STRING10_species.csv' 

Parameters for reading data

 input_missingValue <- 'geneMedian' #Missing values imputation method
 input_dataFileFormat <- 1  #1- read counts, 2 FPKM/RPKM or DNA microarray
 input_minCounts <- 0.5 #Min counts
 input_NminSamples <- 1 #Minimum number of samples 
 input_countsLogStart <- 4  #Pseudo count for log CPM
 input_CountsTransform <- 1 #Methods for data transformation of counts. 1-EdgeR's logCPM 2-VST, 3-rlog 
readMetadata.out <- readMetadata(gldsMetadataFile)
library(knitr)   #  install if needed. for showing tables with kable
library(kableExtra)
kable( readMetadata.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%")
ARG1KO_FLT_Rep1 ARG1KO_FLT_Rep2 ARG1KO_FLT_Rep3 ARG1KO_FLT_Rep4 ARG1KO_GC_Rep1 ARG1KO_GC_Rep2 ARG1KO_GC_Rep3 ARG1KO_GC_Rep4 WT_FLT_Rep1 WT_FLT_Rep2 WT_FLT_Rep3 WT_FLT_Rep4 WT_FLT_Rep5 WT_GC_Rep1 WT_GC_Rep2 WT_GC_Rep3 WT_GC_Rep4 WT_GC_Rep5
Sample.LongId Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep1.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep2.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep3.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep4.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep1.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep2.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep3.147.Array Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep4.147.Array Atha.Col.0.HypocotylCC.WT.FLT.Rep1.147.Array Atha.Col.0.HypocotylCC.WT.FLT.Rep2.147.Array Atha.Col.0.HypocotylCC.WT.FLT.Rep3.147.Array Atha.Col.0.HypocotylCC.WT.FLT.Rep4.147.Array Atha.Col.0.HypocotylCC.WT.FLT.Rep5.147.Array Atha.Col.0.HypocotylCC.WT.GC.Rep1.147.Array Atha.Col.0.HypocotylCC.WT.GC.Rep2.147.Array Atha.Col.0.HypocotylCC.WT.GC.Rep3.147.Array Atha.Col.0.HypocotylCC.WT.GC.Rep4.147.Array Atha.Col.0.HypocotylCC.WT.GC.Rep5.147.Array
Sample.Id Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep1 Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep2 Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep3 Atha.Col.0.HypocotylCC.ARG1.KO.FLT.Rep4 Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep1 Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep2 Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep3 Atha.Col.0.HypocotylCC.ARG1.KO.GC.Rep4 Atha.Col.0.HypocotylCC.WT.FLT.Rep1 Atha.Col.0.HypocotylCC.WT.FLT.Rep2 Atha.Col.0.HypocotylCC.WT.FLT.Rep3 Atha.Col.0.HypocotylCC.WT.FLT.Rep4 Atha.Col.0.HypocotylCC.WT.FLT.Rep5 Atha.Col.0.HypocotylCC.WT.GC.Rep1 Atha.Col.0.HypocotylCC.WT.GC.Rep2 Atha.Col.0.HypocotylCC.WT.GC.Rep3 Atha.Col.0.HypocotylCC.WT.GC.Rep4 Atha.Col.0.HypocotylCC.WT.GC.Rep5
Sample.Name Atha_Col-0_HypocotylCC_ARG1-KO_FLT_Rep1 Atha_Col-0_HypocotylCC_ARG1-KO_FLT_Rep2 Atha_Col-0_HypocotylCC_ARG1-KO_FLT_Rep3 Atha_Col-0_HypocotylCC_ARG1-KO_FLT_Rep4 Atha_Col-0_HypocotylCC_ARG1-KO_GC_Rep1 Atha_Col-0_HypocotylCC_ARG1-KO_GC_Rep2 Atha_Col-0_HypocotylCC_ARG1-KO_GC_Rep3 Atha_Col-0_HypocotylCC_ARG1-KO_GC_Rep4 Atha_Col-0_HypocotylCC_WT_FLT_Rep1 Atha_Col-0_HypocotylCC_WT_FLT_Rep2 Atha_Col-0_HypocotylCC_WT_FLT_Rep3 Atha_Col-0_HypocotylCC_WT_FLT_Rep4 Atha_Col-0_HypocotylCC_WT_FLT_Rep5 Atha_Col-0_HypocotylCC_WT_GC_Rep1 Atha_Col-0_HypocotylCC_WT_GC_Rep2 Atha_Col-0_HypocotylCC_WT_GC_Rep3 Atha_Col-0_HypocotylCC_WT_GC_Rep4 Atha_Col-0_HypocotylCC_WT_GC_Rep5
GLDS 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147 147
Accession GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147 GLDS-147
Hardware BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC BRIC
Tissue Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures Cell cultures
Age 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days 12 days
Organism Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana Arabidopsis thaliana
Ecotype Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0 Col-0
Genotype arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1 arg1
Variety Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1 Col-0 arg1
Radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Cosmic radiation Background Earth Background Earth Background Earth Background Earth Background Earth
Gravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Microgravity Terrestrial Terrestrial Terrestrial Terrestrial Terrestrial
Developmental 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture 12 day old cell culture
Time.series.or.Concentration.gradient Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point Single time point
Light Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark Dark
Assay..RNAseq. Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling Microarray Transcription Profiling
Temperature Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS Ambient ISS
Treatment.type Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight Arg1 functions in the physiological adaptation of undifferentiated plant cells to spaceflight
Treatment.intensity x x x x x x x x x x x x x x x x x x
Treament.timing x x x x x x x x x x x x x x x x x x
Preservation.Method. RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater RNAlater
 readData.out <- readData(inputFile) 
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
   kable( head(readData.out$data) ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
ARG1KO_FLT_Rep1 ARG1KO_FLT_Rep2 ARG1KO_FLT_Rep3 ARG1KO_FLT_Rep4 ARG1KO_GC_Rep1 ARG1KO_GC_Rep2 ARG1KO_GC_Rep3 ARG1KO_GC_Rep4 WT_FLT_Rep1 WT_FLT_Rep2 WT_FLT_Rep3 WT_FLT_Rep4 WT_FLT_Rep5 WT_GC_Rep1 WT_GC_Rep2 WT_GC_Rep3 WT_GC_Rep4 WT_GC_Rep5
AT1G30700 3.459432 3.459432 3.169925 3.584963 3.169925 3.321928 3.169925 3.000000 2.807355 3.000000 3.000000 3.321928 2.807355 3.000000 3.000000 2.807355 2.807355 3.169925
AT5G07870 2.807355 2.807355 3.000000 3.000000 3.321928 3.321928 3.321928 3.321928 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355
AT3G25250 3.169925 3.000000 3.321928 3.321928 3.169925 3.169925 3.584963 3.321928 2.807355 2.807355 3.000000 3.000000 3.000000 3.000000 2.807355 3.000000 2.807355 3.000000
AT5G22570 3.000000 3.000000 3.169925 3.321928 3.321928 3.321928 3.321928 3.000000 3.000000 3.000000 2.807355 2.807355 2.584963 2.807355 2.807355 2.807355 2.807355 3.000000
AT1G28480 3.169925 3.169925 3.459432 3.459432 3.321928 3.459432 3.459432 3.459432 3.000000 3.000000 3.000000 3.169925 3.169925 3.169925 3.000000 3.000000 3.000000 3.169925
AT5G07860 2.584963 2.807355 3.000000 3.000000 3.321928 3.321928 3.321928 3.169925 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355 2.807355
 readSampleInfo.out <- readSampleInfo(sampleInfoFile) 
 kable( readSampleInfo.out ) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Gravity Variety
ARG1KO_FLT_Rep1 Microgravity Col0 arg1
ARG1KO_FLT_Rep2 Microgravity Col0 arg1
ARG1KO_FLT_Rep3 Microgravity Col0 arg1
ARG1KO_FLT_Rep4 Microgravity Col0 arg1
ARG1KO_GC_Rep1 Terrestrial Col0 arg1
ARG1KO_GC_Rep2 Terrestrial Col0 arg1
ARG1KO_GC_Rep3 Terrestrial Col0 arg1
ARG1KO_GC_Rep4 Terrestrial Col0 arg1
WT_FLT_Rep1 Microgravity Col0 WT
WT_FLT_Rep2 Microgravity Col0 WT
WT_FLT_Rep3 Microgravity Col0 WT
WT_FLT_Rep4 Microgravity Col0 WT
WT_FLT_Rep5 Microgravity Col0 WT
WT_GC_Rep1 Terrestrial Col0 WT
WT_GC_Rep2 Terrestrial Col0 WT
WT_GC_Rep3 Terrestrial Col0 WT
WT_GC_Rep4 Terrestrial Col0 WT
WT_GC_Rep5 Terrestrial Col0 WT
 input_selectOrg ="NEW" 
 input_selectGO <- 'GOBP'   #Gene set category 
 input_noIDConversion = TRUE  
 allGeneInfo.out <- geneInfo(geneInfoFile) 
 converted.out = NULL 
 convertedData.out <- convertedData()    
 nGenesFilter()  
## [1] "16156 genes in 18 samples. 16156  genes passed filter.\n Original gene IDs used."
 convertedCounts.out <- convertedCounts()  # converted counts, just for compatibility 
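
The reading parameters set earlier (input_minCounts, input_NminSamples, input_countsLogStart) describe a low-expression filter followed by a started-log transformation. A minimal base-R sketch of that idea on a toy count matrix follows. It assumes the filter is applied to counts per million (CPM) and is not iDEP's actual implementation; the function name `filter_and_log` and the toy data are hypothetical.

```r
# Hypothetical helper: keep genes with CPM >= min_cpm in at least
# n_min_samples samples, then return log2(CPM + pseudo).
filter_and_log <- function(counts, min_cpm = 0.5, n_min_samples = 1, pseudo = 4) {
  cpm <- t(t(counts) / colSums(counts)) * 1e6   # scale each sample to 1e6 reads
  keep <- rowSums(cpm >= min_cpm) >= n_min_samples
  log2(cpm[keep, , drop = FALSE] + pseudo)
}

# Toy counts: 3 genes x 2 samples; gene g2 is never detected
toy <- matrix(c(10, 0, 90, 20, 0, 80), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_log <- filter_and_log(toy)
toy_log
```

The pseudo count keeps the logarithm finite for zero counts and damps the variance of weakly expressed genes; larger values shrink low-count differences more strongly.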

2. Pre-process

# Read counts per library 
 parDefault = par(no.readonly = TRUE)  # save only the settable graphical parameters
 par(mar=c(12,4,2,2)) 
 # barplot of total read counts
 x <- readData.out$rawCounts
 groups = as.factor( detectGroups(colnames(x ) ) )
 if(nlevels(groups)<=1 || nlevels(groups) >20 )  
  col1 = 'green'  else
  col1 = rainbow(nlevels(groups))[ groups ]             
         
 barplot( colSums(x)/1e6, 
        col=col1,las=3, main="Total read counts (millions)")  

 readCountsBias()  # detecting bias in sequencing depth 
## [1] 0.5231528
## [1] 0.92608
## [1] 0.1243607
## [1] "No bias detected"
 # Box plot 
 x = readData.out$data 
 boxplot(x, las = 2, col=col1,
    ylab='Transformed expression levels',
    main='Distribution of transformed data') 

 #Density plot 
 par(parDefault)  # restore the saved graphical parameters
 densityPlot()       

 # Scatter plot of the first two samples 
 plot(x[,1:2],xlab=colnames(x)[1],ylab=colnames(x)[2], 
    main='Scatter plot of first two samples') 

 ####plot gene or gene family
 input_selectOrg ="BestMatch" 
 input_geneSearch <- 'HOXA' #Gene ID for searching 
 genePlot()  
## NULL
 input_useSD <- 'FALSE' #Use standard deviation instead of standard error in error bar? 
 geneBarPlotError()       
## NULL

3. Heatmap

 # hierarchical clustering tree
 x <- readData.out$data
 maxGene <- apply(x,1,max)
 # remove bottom 25% lowly expressed genes, which inflate the PCC
 x <- x[which(maxGene > quantile(maxGene, 0.25) ) ,] 
 plot(as.dendrogram(hclust2( dist2(t(x)))), ylab="1 - Pearson C.C.", type = "rectangle") 

 #Correlation matrix
 input_labelPCC <- TRUE #Show correlation coefficient? 
 correlationMatrix() 
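
The dendrogram axis label "1 - Pearson C.C." indicates that samples are clustered on a correlation-based distance. A minimal base-R sketch of that kind of clustering on toy data is given here. `pearson_dist` is a hypothetical stand-in, and the average linkage is an assumption; neither is taken from the iDEP source.

```r
# Hypothetical stand-in for a correlation distance between the rows of m
pearson_dist <- function(m) as.dist(1 - cor(t(m), method = "pearson"))

set.seed(1)
# Toy matrix: 200 genes (rows) x 6 samples (columns)
toy <- matrix(rnorm(200 * 6), nrow = 200,
              dimnames = list(NULL, paste0("S", 1:6)))

# Drop the 25% of genes with the lowest maximum expression
max_gene <- apply(toy, 1, max)
toy <- toy[max_gene > quantile(max_gene, 0.25), ]

# Samples are clustered, so the matrix is transposed (samples in rows)
d <- pearson_dist(t(toy))
tree <- hclust(d, method = "average")  # linkage choice is an assumption
tree$order  # plot(as.dendrogram(tree)) would draw the tree
```

A distance of 0 means two samples are perfectly correlated and 2 means they are perfectly anti-correlated, so replicate samples are expected to join near the bottom of the tree.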

 # Parameters for heatmap
 input_nGenes <- 1000   #Top genes for heatmap
 input_geneCentering <- TRUE    #centering genes ?
 input_sampleCentering <- FALSE #Center by sample?
 input_geneNormalize <- FALSE   #Normalize by gene?
 input_sampleNormalize <- FALSE #Normalize by sample?
 input_noSampleClustering <- FALSE  #Use original sample order
 input_heatmapCutoff <- 4   #Remove outliers beyond number of SDs 
 input_distFunctions <- 1   #which distance function to use
 input_hclustFunctions <- 1 #Linkage type
 input_heatColors1 <- 1 #Colors
 input_selectFactorsHeatmap <- 'Gravity'    #Sample coloring factors 
 png('heatmap.png', width = 10, height = 15, units = 'in', res = 300) 
 staticHeatmap() 
 dev.off()  
## png 
##   2

[heatmap](heatmap.png)

 heatmapPlotly() # interactive heatmap using Plotly 
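
The heatmap parameters above select the most variable genes, center each gene, and cap extreme values. The sketch below shows one plausible reading of input_nGenes, input_geneCentering and input_heatmapCutoff in base R. The helper `prepare_heatmap_matrix` is hypothetical, and the exact clipping rule used by iDEP may differ.

```r
# Hypothetical helper; an assumed reading of the heatmap parameters.
prepare_heatmap_matrix <- function(x, n_genes = 1000, cutoff_sd = 4) {
  n_genes <- min(n_genes, nrow(x))
  gene_sd <- apply(x, 1, sd)
  # keep the n_genes rows with the largest standard deviation
  top <- x[order(gene_sd, decreasing = TRUE)[seq_len(n_genes)], , drop = FALSE]
  centered <- top - rowMeans(top)              # gene centering
  limit <- cutoff_sd * sd(as.vector(centered)) # cap at cutoff_sd standard deviations
  pmin(pmax(centered, -limit), limit)
}

set.seed(2)
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("S", 1:6)))
h <- prepare_heatmap_matrix(x, n_genes = 10)
dim(h)
```

Capping outliers keeps a handful of extreme values from compressing the color scale for all other genes.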

4. K-means clustering

 input_nGenesKNN <- 2000    #Number of genes for k-Means
 input_nClusters <- 4   #Number of clusters 
 maxGeneClustering = 12000
 input_kmeansNormalization <- 'geneMean'    #Normalization
 input_KmeansReRun <- 0 #Random seed 

 distributionSD()  #Distribution of standard deviations 

 KmeansNclusters()  #Number of clusters 

 Kmeans.out = Kmeans()   #Running K-means 
 KmeansHeatmap()   #Heatmap for k-Means 
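
The k-means step groups the most variable genes by expression pattern. A minimal base-R sketch on toy data follows; it assumes that the 'geneMean' normalization means subtracting each gene's mean, which this report does not confirm, and it uses far fewer genes than the 2000 set above.

```r
set.seed(0)  # k-means starts from random centers, so fix the seed
# Toy matrix: 300 genes x 6 samples, with two groups of genes shifted
toy <- matrix(rnorm(300 * 6), nrow = 300)
toy[1:100, 4:6] <- toy[1:100, 4:6] + 3
toy[101:200, 4:6] <- toy[101:200, 4:6] - 3

# Keep the most variable genes (200 here)
gene_sd <- apply(toy, 1, sd)
top <- toy[order(gene_sd, decreasing = TRUE)[1:200], ]

# Assumed meaning of 'geneMean': subtract each gene's mean
top <- top - rowMeans(top)

fit <- kmeans(top, centers = 4, nstart = 10, iter.max = 50)
table(fit$cluster)
```

Because the starting centers are random, cluster labels can change between runs unless the seed is fixed, which is what a random-seed parameter such as input_KmeansReRun is for.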

 #Read gene sets for enrichment analysis 
 sqlite  <- dbDriver('SQLite')
 input_selectGO3 <- 'GOBP'  #Gene set category
 input_minSetSize <- 15 #Min gene set size
 input_maxSetSize <- 2000   #Max gene set size 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO3,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 # Alternatively, users can use their own GMT files by
 #GeneSets.out <- readGMTRobust('somefile.GMT')  
 results <- KmeansGO()  #Enrichment analysis for k-Means clusters   
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster  adj.Pval  Genes  Pathways
A        9.09e-26     99  Response to abiotic stimulus
A        1.60e-24     93  Response to organic substance
A        2.88e-23     83  Response to endogenous stimulus
A        1.51e-22     81  Response to hormone
A        5.10e-22     72  Response to external stimulus
A        8.58e-22     76  Cellular response to chemical stimulus
A        8.97e-20     62  Defense response
A        8.97e-20     57  Response to external biotic stimulus
A        8.97e-20     57  Response to other organism
A        1.57e-19     57  Response to biotic stimulus
B        6.64e-35    150  Response to abiotic stimulus
B        6.96e-34    154  Regulation of gene expression
B        1.51e-28    146  Nucleobase-containing compound biosynthetic process
B        3.18e-28    145  Multicellular organism development
B        8.47e-28    131  Response to organic substance
B        1.76e-27    137  Regulation of biosynthetic process
B        6.95e-27    132  Regulation of macromolecule biosynthetic process
B        1.20e-26    128  Regulation of RNA metabolic process
B        3.09e-26    130  Regulation of cellular macromolecule biosynthetic process
B        5.79e-26    133  Regulation of cellular biosynthetic process
C        2.92e-15     60  Oxidation-reduction process
C        5.57e-12     67  Response to abiotic stimulus
C        3.63e-11     29  Generation of precursor metabolites and energy
C        3.98e-08     41  Response to acid chemical
C        3.98e-08     49  Response to oxygen-containing compound
C        3.99e-08     19  Photosynthesis
C        3.99e-08     14  Cellular respiration
C        4.64e-08     17  Electron transport chain
C        3.76e-07     14  Energy derivation by oxidation of organic compounds
C        8.37e-07     44  Cellular response to chemical stimulus
D        6.32e-23     84  Response to abiotic stimulus
D        2.29e-15     61  Response to oxygen-containing compound
D        1.84e-12     63  Response to organic substance
D        1.84e-12     42  Response to inorganic substance
D        1.84e-12     54  Cellular response to chemical stimulus
D        1.03e-11     46  Response to acid chemical
D        8.61e-11     54  Response to endogenous stimulus
D        1.37e-10     53  Response to hormone
D        6.47e-10     31  Response to osmotic stress
D        8.14e-10     34  Response to lipid
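
Adjusted p-values like those in the k-means enrichment table are the typical output of an over-representation test. A common way to compute them is a hypergeometric test followed by FDR correction. The sketch below assumes that approach and uses made-up counts, so it does not reproduce the table; `enrich_pvalue` is a hypothetical helper.

```r
# Hypothetical helper: P(X >= overlap) when query_size genes are drawn
# from a universe that contains set_size members of the gene set.
enrich_pvalue <- function(overlap, set_size, query_size, universe_size) {
  phyper(overlap - 1, set_size, universe_size - set_size, query_size,
         lower.tail = FALSE)
}

# Made-up counts: three gene sets tested against one cluster of 500 genes
universe <- 16000
cluster_size <- 500
set_sizes <- c(1200, 300, 50)
overlaps  <- c(90, 10, 1)

p_raw <- enrich_pvalue(overlaps, set_sizes, cluster_size, universe)
p_adj <- p.adjust(p_raw, method = "fdr")
data.frame(set_size = set_sizes, overlap = overlaps, p_raw, adj.Pval = p_adj)
```

The `overlap - 1` term matters: `phyper` with `lower.tail = FALSE` returns P(X > q), so shifting by one gives the probability of seeing at least the observed overlap.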
 input_seedTSNE <- 0    #Random seed for t-SNE
 input_colorGenes <- TRUE   #Color genes in t-SNE plot? 
 tSNEgenePlot()  #Plot genes using t-SNE 

5. PCA and beyond

 input_selectFactors <- 'Gravity'   #Factor coded by color
 input_selectFactors2 <- 'Variety'  #Factor coded by shape
 input_tsneSeed2 <- 0   #Random seed for t-SNE 
 #PCA, MDS and t-SNE plots
 PCAplot()  

 MDSplot() 

 tSNEplot()  

 #Read gene sets for pathway analysis using PGSEA on principal components 
 input_selectGO6 <- 'GOBP' 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO6,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 PCApathway() # Run PGSEA analysis 
## Warning: Package 'KEGG.db' is deprecated and will be removed from Bioconductor
##   version 3.12

 cat( PCA2factor() )   #The correlation between PCs with factors 
## 
##  Correlation between Principal Components (PCs) with factors
## PC1 is correlated with Variety (p=1.93e-02).
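
The message above reports which principal components are associated with an experimental factor. How PCA2factor() tests this is not shown in this report. The sketch below assumes a one-way ANOVA of the PC scores against the factor, on toy data with a built-in group effect; the helper `pc_pvalue` is hypothetical.

```r
set.seed(1)
# Toy data: 50 genes x 8 samples; the first 25 genes are shifted in group B
group <- factor(rep(c("A", "B"), each = 4))
toy <- matrix(rnorm(50 * 8), nrow = 50)
toy[1:25, group == "B"] <- toy[1:25, group == "B"] + 5

# prcomp() expects samples in rows, hence the transpose
pca <- prcomp(t(toy), center = TRUE, scale. = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Hypothetical helper: one-way ANOVA p-value of PC scores against a factor
pc_pvalue <- function(scores, f) anova(lm(scores ~ f))[["Pr(>F)"]][1]
p_values <- apply(pca$x[, 1:3], 2, pc_pvalue, f = group)

round(var_explained[1:3], 3)
p_values
```

A small p-value for PC1 means the largest axis of variation separates the levels of the factor, which is the kind of statement printed for Variety above.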

6. DEG1

 input_CountsDEGMethod <- 3 #DESeq2= 3,limma-voom=2,limma-trend=1 
 input_limmaPval <- 0.1 #FDR cutoff
 input_limmaFC <- 2 #Fold-change cutoff
 input_selectModelComprions <- 'Gravity: Microgravity vs. Terrestrial'  #Selected comparisons
 input_selectFactorsModel <- 'Gravity'  #Selected factors for the model
 input_selectInteractions <- NULL   #Selected interaction terms
 input_selectBlockFactorsModel <- NULL  #Selected blocking factors
 factorReferenceLevels.out <- c('Gravity:Terrestrial') 
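
The two cutoffs above decide which genes are called differentially expressed. The sketch below applies them to a made-up results table. The column names log2FC and padj, and the comparison of the fold-change cutoff on the log2 scale, are assumptions for illustration rather than iDEP's exact rule.

```r
# Made-up results table; column names are assumptions
res <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  log2FC = c(1.8, -1.2, 0.4, -2.5),
  padj   = c(0.01, 0.08, 0.001, 0.30)
)

fdr_cutoff <- 0.1   # corresponds to input_limmaPval
fc_cutoff  <- 2     # corresponds to input_limmaFC (linear scale)

# A gene is significant only if it passes both the FDR and fold-change cutoffs
sig <- res$padj < fdr_cutoff & abs(res$log2FC) >= log2(fc_cutoff)
res$direction <- "not significant"
res$direction[sig & res$log2FC > 0] <- "up"
res$direction[sig & res$log2FC < 0] <- "down"
res
```

Here g3 has a very small adjusted p-value but is not called because its fold change is below 2, and g4 has a large fold change but fails the FDR cutoff.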

 limma.out <- limma()
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
## Error in estimateDispersionsFit(object, fitType = fitType, quiet = quiet): all gene-wise dispersion estimates are within 2 orders of magnitude
##   from the minimum value, and so the standard curve fitting techniques will not work.
##   One can instead use the gene-wise estimates as final estimates:
##   dds <- estimateDispersionsGeneEst(dds)
##   dispersions(dds) <- mcols(dds)$dispGeneEst
##   ...then continue with testing using nbinomWaldTest or nbinomLRT
 DEG.data.out <- DEG.data()
## Error in DEG.data(): object 'limma.out' not found
 limma.out$comparisons 
## Error in eval(expr, envir, enclos): object 'limma.out' not found
 input_selectComparisonsVenn = limma.out$comparisons[1:3] # use first three comparisons
## Error in eval(expr, envir, enclos): object 'limma.out' not found
 input_UpDownRegulated <- FALSE #Split up and down regulated genes 
 vennPlot() # Venn diagram 
## Error in vennPlot(): object 'limma.out' not found
  sigGeneStats() # number of DEGs as figure 
## Error in sigGeneStats(): object 'limma.out' not found
  sigGeneStatsTable() # number of DEGs as table 
## Error in sigGeneStatsTable(): object 'limma.out' not found

7. DEG2

 input_selectContrast = limma.out$comparisons[1] # use first  comparisons 
## Error in eval(expr, envir, enclos): object 'limma.out' not found
 selectedHeatmap.data.out <- selectedHeatmap.data()
## Error in selectedHeatmap.data(): object 'limma.out' not found
 selectedHeatmap()   # heatmap for DEGs in selected comparison
## Error in selectedHeatmap(): object 'selectedHeatmap.data.out' not found
 # Save gene lists and data into files
 write.csv( selectedHeatmap.data()$genes, 'heatmap.data.csv') 
## Error in selectedHeatmap.data(): object 'limma.out' not found
 write.csv(DEG.data(),'DEG.data.csv' )
## Error in DEG.data(): object 'limma.out' not found
 write(AllGeneListsGMT() ,'AllGeneListsGMT.gmt')
## Error in AllGeneListsGMT(): object 'limma.out' not found
 input_selectGO2 <- 'GOBP'  #Gene set category 
 geneListData.out <- geneListData()  
## Error in geneListData(): object 'input_selectContrast' not found
 volcanoPlot()  
## Error in volcanoPlot(): object 'limma.out' not found
  scatterPlot()  
## Error in scatterPlot(): object 'limma.out' not found
  MAplot()  
## Error in MAplot(): object 'limma.out' not found
  geneListGOTable.out <- geneListGOTable()  
## Error in geneListGOTable(): object 'selectedHeatmap.data.out' not found
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO2,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_removeRedudantSets <- TRUE   #Remove highly redundant gene sets? 
 results <- geneListGO()  #Enrichment analysis
## Error in geneListGO(): object 'geneListGOTable.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 

STRING-db API access. We need to find the NCBI taxonomy ID of your species, which is used by STRING. First we try to guess the ID based on iDEP’s database. Users can also skip this step and assign the NCBI taxonomy ID directly, e.g. findTaxonomyID.out = 10090 (mouse is 10090, human is 9606, etc.).

 STRING10_species = read.csv(STRING10_speciesFile)  
 ix = grep('Arabidopsis thaliana', STRING10_species$official_name ) 
 findTaxonomyID.out <- STRING10_species[ix,1] # find taxonomyID
 findTaxonomyID.out  
## [1] 3702
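
A pattern search such as grep() can return no match or several matches, which would silently pass a wrong or empty taxonomy ID to later steps. The sketch below shows a stricter lookup on a toy table. Only the column name official_name comes from this report; the column name species_id and the helper `find_taxonomy_id` are hypothetical.

```r
# Toy stand-in for the species table; species_id is an assumed column name
species <- data.frame(
  species_id    = c(3702, 9606, 10090),
  official_name = c("Arabidopsis thaliana", "Homo sapiens", "Mus musculus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: exact-name lookup that fails loudly on 0 or >1 matches
find_taxonomy_id <- function(name, table) {
  ix <- which(table$official_name == name)
  if (length(ix) != 1) {
    stop("Expected exactly one match for '", name, "', found ", length(ix))
  }
  table[ix, 1]  # the ID is taken from the first column, as in the code above
}

find_taxonomy_id("Arabidopsis thaliana", species)
```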

Enrichment analysis using STRING

  STRINGdb_geneList.out <- STRINGdb_geneList() #convert gene lists
## Error in STRINGdb_geneList(): object 'geneListData.out' not found
 input_STRINGdbGO <- 'Process'  #'Process', 'Component', 'Function', 'KEGG', 'Pfam', 'InterPro' 
 results <- stringDB_GO_enrichmentData()  # enrichment using STRING 
## Error in stringDB_GO_enrichmentData(): object 'selectedHeatmap.data.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 

PPI network retrieval and analysis

 input_nGenesPPI <- 100 #Number of top genes for PPI retrieval and analysis 
 stringDB_network1(1) #Show PPI network 
## Error in stringDB_network1(1): object 'STRINGdb_geneList.out' not found

Generating interactive PPI

 write(stringDB_network_link(), 'PPI_results.html') # write results to html file 
## Error in stringDB_network_link(): object 'STRINGdb_geneList.out' not found
 browseURL('PPI_results.html') # open in browser 

8. Pathway analysis

 input_selectContrast1 = limma.out$comparisons[1] 
## Error in eval(expr, envir, enclos): object 'limma.out' not found
 #input_selectContrast1 = limma.out$comparisons[3] # manually set
 input_selectGO <- 'GOBP'   #Gene set category 
 #input_selectGO='custom' # if custom gmt file
 input_minSetSize <- 15 #Min size for gene set
 input_maxSetSize <- 2000   #Max size for gene set 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_pathwayPvalCutoff <- 0.2 #FDR cutoff
 input_nPathwayShow <- 30   #Top pathways to show
 input_absoluteFold <- FALSE    #Use absolute values of fold-change?
 input_GenePvalCutoff <- 1  #FDR to remove genes 
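
The size limits input_minSetSize and input_maxSetSize exclude gene sets that are too small to test reliably or too broad to be informative. The sketch below shows one way such a filter could work on a named list of gene sets. The helper `filter_gene_sets` is hypothetical, and counting set size after intersecting with the measured genes is an assumption.

```r
# Hypothetical helper: keep gene sets whose overlap with the measured genes
# has between min_size and max_size members (inclusive).
filter_gene_sets <- function(gene_sets, measured_genes,
                             min_size = 15, max_size = 2000) {
  trimmed <- lapply(gene_sets, intersect, measured_genes)
  sizes <- lengths(trimmed)
  trimmed[sizes >= min_size & sizes <= max_size]
}

measured <- paste0("gene", 1:100)
sets <- list(
  too_small  = paste0("gene", 1:5),    # only 5 measured genes
  ok         = paste0("gene", 1:40),   # 40 measured genes
  unmeasured = paste0("other", 1:50)   # no overlap with the data
)
kept <- filter_gene_sets(sets, measured)
names(kept)
```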

 input_pathwayMethod = 1  # 1  GAGE
 gagePathwayData.out <- gagePathwayData()  # pathway analysis using GAGE  
## Error in gagePathwayData(): object 'limma.out' not found
 results <- gagePathwayData.out  #Pathway analysis results from GAGE  
## Error in eval(expr, envir, enclos): object 'gagePathwayData.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
 pathwayListData.out = pathwayListData() 
## Error in pathwayListData(): object 'gagePathwayData.out' not found
 enrichmentPlot(pathwayListData.out, 25  ) 
## Error in enrichmentPlot(pathwayListData.out, 25): object 'pathwayListData.out' not found
  enrichmentNetwork(pathwayListData.out )  
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'pathwayListData.out' not found
  enrichmentNetworkPlotly(pathwayListData.out) 
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'pathwayListData.out' not found
 input_pathwayMethod = 3  # 3  fgsea 
 fgseaPathwayData.out <- fgseaPathwayData() #Pathway analysis using fgsea 
## Error in fgseaPathwayData(): object 'limma.out' not found
 results <- fgseaPathwayData.out  # fgsea pathway analysis results
## Error in eval(expr, envir, enclos): object 'fgseaPathwayData.out' not found
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
Cluster  adj.Pval  Genes  Pathways
A        9.09e-26     99  Response to abiotic stimulus
         1.60e-24     93  Response to organic substance
         2.88e-23     83  Response to endogenous stimulus
         1.51e-22     81  Response to hormone
         5.10e-22     72  Response to external stimulus
         8.58e-22     76  Cellular response to chemical stimulus
         8.97e-20     62  Defense response
         8.97e-20     57  Response to external biotic stimulus
         8.97e-20     57  Response to other organism
         1.57e-19     57  Response to biotic stimulus
B        6.64e-35    150  Response to abiotic stimulus
         6.96e-34    154  Regulation of gene expression
         1.51e-28    146  Nucleobase-containing compound biosynthetic process
         3.18e-28    145  Multicellular organism development
         8.47e-28    131  Response to organic substance
         1.76e-27    137  Regulation of biosynthetic process
         6.95e-27    132  Regulation of macromolecule biosynthetic process
         1.20e-26    128  Regulation of RNA metabolic process
         3.09e-26    130  Regulation of cellular macromolecule biosynthetic process
         5.79e-26    133  Regulation of cellular biosynthetic process
C        2.92e-15     60  Oxidation-reduction process
         5.57e-12     67  Response to abiotic stimulus
         3.63e-11     29  Generation of precursor metabolites and energy
         3.98e-08     41  Response to acid chemical
         3.98e-08     49  Response to oxygen-containing compound
         3.99e-08     19  Photosynthesis
         3.99e-08     14  Cellular respiration
         4.64e-08     17  Electron transport chain
         3.76e-07     14  Energy derivation by oxidation of organic compounds
         8.37e-07     44  Cellular response to chemical stimulus
D        6.32e-23     84  Response to abiotic stimulus
         2.29e-15     61  Response to oxygen-containing compound
         1.84e-12     63  Response to organic substance
         1.84e-12     42  Response to inorganic substance
         1.84e-12     54  Cellular response to chemical stimulus
         1.03e-11     46  Response to acid chemical
         8.61e-11     54  Response to endogenous stimulus
         1.37e-10     53  Response to hormone
         6.47e-10     31  Response to osmotic stress
         8.14e-10     34  Response to lipid
  pathwayListData.out = pathwayListData() 
## Error in pathwayListData(): object 'fgseaPathwayData.out' not found
 enrichmentPlot(pathwayListData.out, 25  ) 
## Error in enrichmentPlot(pathwayListData.out, 25): object 'pathwayListData.out' not found
  enrichmentNetwork(pathwayListData.out )  
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'pathwayListData.out' not found
  enrichmentNetworkPlotly(pathwayListData.out) 
## Error in h(simpleError(msg, call)): error in evaluating the argument 'X' in selecting a method for function 'lapply': object 'pathwayListData.out' not found
   PGSEAplot() # pathway analysis using PGSEA 
## Error in PGSEAplot(): object 'input_selectContrast1' not found
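
Most of the errors in this section trace back to objects such as `limma.out` that were never created, so each later assignment to `results` fails and the printed table most likely shows whatever `results` held from an earlier step. A minimal sketch of one way to guard against this follows. The helper `run_if_available()` and the environment `demo_env` are hypothetical names introduced here for illustration; they are not part of iDEP.

```r
# Hypothetical helper (not an iDEP function): call `fn` only when every
# object named in `needs` exists in `env`; otherwise return NULL.
run_if_available <- function(fn, needs, env = parent.frame()) {
  found <- vapply(needs, exists, logical(1), envir = env, inherits = FALSE)
  missing_objs <- needs[!found]
  if (length(missing_objs) > 0) {
    message("Skipping step; missing: ", paste(missing_objs, collapse = ", "))
    return(NULL)
  }
  fn()
}

# Demo in an isolated environment holding a stale `results` table.
demo_env <- new.env()
assign("results", data.frame(Pathways = "from an earlier step"), envir = demo_env)

# `limma.out` is absent, so the step is skipped and NULL is returned.
out <- run_if_available(function() "pathway table", needs = "limma.out", env = demo_env)
if (is.null(out)) {
  # Drop the stale table so it cannot be printed by mistake.
  rm("results", envir = demo_env)
}

# Once the prerequisite exists, the step runs normally.
assign("limma.out", list(comparisons = "A-B"), envir = demo_env)
out2 <- run_if_available(function() "pathway table", needs = "limma.out", env = demo_env)
```

In a real run, the simpler remedy is to make sure the differential-expression step that creates `limma.out` completes before this section is executed.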

9. Chromosome

 input_selectContrast2 = limma.out$comparisons[1] 
## Error in eval(expr, envir, enclos): object 'limma.out' not found
 #input_selectContrast2 = limma.out$comparisons[3] # manually set
 input_limmaPvalViz <- 0.1  #FDR to filter genes
 input_limmaFCViz <- 2  #Fold-change to filter genes 
 genomePlotly() # shows fold-changes on the genome 
## Error in genomePlotly(): object 'limma.out' not found
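
The two cutoffs above select which genes are drawn on the genome. A minimal sketch of that kind of filter is given here on a made-up table. The column names `log2FC` and `FDR`, the toy values, and the assumption that the fold-change cutoff is on a linear scale (compared against log2 fold-changes) are illustrative choices, not taken from the iDEP source.

```r
# Toy differential-expression table; all values are made up for illustration.
de <- data.frame(
  gene   = c("g1", "g2", "g3", "g4"),
  log2FC = c(1.5, -2.0, 0.4, -0.9),
  FDR    = c(0.01, 0.05, 0.001, 0.30),
  stringsAsFactors = FALSE
)

fdr_cutoff <- 0.1  # plays the role of input_limmaPvalViz
fc_cutoff  <- 2    # plays the role of input_limmaFCViz (assumed linear scale)

# Keep genes that pass both the FDR cutoff and the fold-change cutoff,
# in either direction of change.
keep <- de$FDR < fdr_cutoff & abs(de$log2FC) >= log2(fc_cutoff)
de_filtered <- de[keep, ]
print(de_filtered)
```

Here `g3` is dropped for a small fold-change despite its low FDR, and `g4` is dropped for its high FDR.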

10. Biclustering

 input_nGenesBiclust <- 1000    #Top genes for biclustering
 input_biclustMethod <- 'BCCC()'    #Method: 'BCCC', 'QUBIC', 'runibic' ... 
 biclustering.out = biclustering()  # run analysis

 input_selectBicluster <- 1 #select a cluster 
 biclustHeatmap()   # heatmap for selected cluster 

 input_selectGO4 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO4,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  )  
 results <- geneListBclustGO()  #Enrichment analysis for selected bicluster
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
adj.Pval  Genes  Pathways
1.4e-64     222  Response to abiotic stimulus
9.3e-44     182  Response to organic substance
9.3e-44     161  Response to oxygen-containing compound
1.0e-40     159  Response to hormone
1.8e-40     160  Response to endogenous stimulus
1.1e-38     139  Response to external stimulus
3.9e-36     126  Response to acid chemical
1.4e-32     138  Cellular response to chemical stimulus
1.4e-31     126  Multi-organism process
3.9e-30      50  Cellular response to decreased oxygen levels
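
The adj.Pval and Genes columns are the kind of output an over-representation test produces. The internals of `geneListBclustGO()` are not shown in this report, so the following is only a sketch under the assumption of a hypergeometric test with FDR correction, which is a common choice. All counts and the names `setA`, `setB`, `setC` are made up.

```r
# Toy counts, made up for illustration only.
universe_size <- 20000                                   # background genes
query_size    <- 1000                                    # genes in the cluster
set_sizes     <- c(setA = 500, setB = 200, setC = 50)    # genes per gene set
overlaps      <- c(setA = 60,  setB = 10,  setC = 12)    # cluster genes in each set

# P(X >= overlap) when the cluster is a random draw from the background.
p_raw <- phyper(overlaps - 1, set_sizes, universe_size - set_sizes,
                query_size, lower.tail = FALSE)
adj_p <- p.adjust(p_raw, method = "fdr")

results_demo <- data.frame(adj.Pval = adj_p, Genes = overlaps,
                           Pathways = names(set_sizes),
                           stringsAsFactors = FALSE)

# Sort while adj.Pval is still numeric; format() turns it into text.
results_demo <- results_demo[order(results_demo$adj.Pval), ]
results_demo$adj.Pval <- format(results_demo$adj.Pval, digits = 3)
print(results_demo, row.names = FALSE)
```

Because `format()` returns character strings, any sorting or thresholding on adj.Pval has to happen before that line, as it does in the sketch.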

11. Co-expression network

 input_mySoftPower <- 5 #Soft threshold power
 input_nGenesNetwork <- 1000    #Number of top genes
 input_minModuleSize <- 20  #Module size minimum 
 wgcna.out = wgcna()   # run WGCNA  
## Warning: executing %dopar% sequentially: no parallel backend registered
##    Power SFT.R.sq  slope truncated.R.sq mean.k. median.k. max.k.
## 1      1   0.2890  1.630          0.928 312.000   311.000 451.00
## 2      2   0.0627 -0.305          0.901 136.000   132.000 251.00
## 3      3   0.2860 -0.597          0.934  69.100    64.400 153.00
## 4      4   0.5440 -0.862          0.968  38.700    34.500  99.20
## 5      5   0.6740 -1.030          0.962  23.100    19.800  67.80
## 6      6   0.7430 -1.130          0.990  14.500    11.900  47.90
## 7      7   0.7820 -1.260          0.989   9.510     7.460  34.80
## 8      8   0.7950 -1.350          0.971   6.430     4.810  25.90
## 9      9   0.8270 -1.390          0.987   4.480     3.160  19.60
## 10    10   0.8390 -1.450          0.991   3.190     2.140  15.20
## 11    12   0.8580 -1.520          0.974   1.730     1.080   9.42
## 12    14   0.8330 -1.650          0.950   1.010     0.585   6.54
## 13    16   0.8880 -1.680          0.986   0.628     0.344   4.77
## 14    18   0.8870 -1.700          0.941   0.412     0.199   3.60
## 15    20   0.3070 -2.920          0.255   0.283     0.118   3.09
## TOM calculation: adjacency..
## ..will not use multithreading.
##  Fraction of slow calculations: 0.000000
## ..connectivity..
## ..matrix multiplication (system BLAS)..
## ..normalization..
## ..done.
 softPower()  # soft power curve 
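
The scale-free fit table printed above can be used to check the chosen soft power. The sketch below copies the Power, SFT.R.sq, and slope columns from that table and picks the lowest power whose signed fit reaches a threshold. The 0.8 threshold is an assumption (values of roughly 0.8 to 0.9 are often quoted as a rule of thumb), and the variable names are illustrative.

```r
# Values copied from the scale-free fit table in the output above.
sft <- data.frame(
  Power    = c(1:10, 12, 14, 16, 18, 20),
  SFT.R.sq = c(0.2890, 0.0627, 0.2860, 0.5440, 0.6740, 0.7430, 0.7820, 0.7950,
               0.8270, 0.8390, 0.8580, 0.8330, 0.8880, 0.8870, 0.3070),
  slope    = c(1.630, -0.305, -0.597, -0.862, -1.030, -1.130, -1.260, -1.350,
               -1.390, -1.450, -1.520, -1.650, -1.680, -1.700, -2.920)
)

# Signed fit: only a negative slope counts as scale-free behaviour.
sft$signed_fit <- -sign(sft$slope) * sft$SFT.R.sq

r2_threshold <- 0.8  # assumed rule-of-thumb threshold, not set in this report
candidates   <- sft$Power[sft$signed_fit >= r2_threshold]
chosen_power <- if (length(candidates) > 0) min(candidates) else NA

# Fit at the power actually used in this analysis (input_mySoftPower = 5).
fit_at_5 <- sft$signed_fit[sft$Power == 5]
cat("Lowest power reaching threshold:", chosen_power, "\n")
cat("Signed fit at power 5:", fit_at_5, "\n")
```

Under this assumed threshold the table would suggest a power of 9, whereas the power of 5 used here has a fit of 0.674, so the choice may be worth revisiting against the soft power curve.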

  modulePlot()  # plot modules  

  listWGCNA.Modules.out = listWGCNA.Modules() #modules
 input_selectGO5 <- 'GOBP'  #Gene set category 
 # Read pathway data again 
 GeneSets.out <-readGeneSets( geneSetFile,
    convertedData.out, input_selectGO5,input_selectOrg,
    c(input_minSetSize, input_maxSetSize)  ) 
 input_selectWGCNA.Module <- 'Entire network'   #Select a module
 input_topGenesNetwork <- 10    #Number of top genes to show
 input_edgeThreshold <- 0.4 #Edge weight threshold 
 moduleNetwork()    # show network of top genes in selected module
##  softConnectivity: FYI: connecitivty of genes with less than 6 valid samples will be returned as NA.
##  ..calculating connectivities..
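
The edge threshold controls which gene pairs are drawn as connected. The sketch below illustrates the idea on simulated data, using a soft-thresholded correlation as the edge weight. This is an assumption made for illustration: the actual `moduleNetwork()` may threshold a topological overlap matrix instead, and all names and data here are made up.

```r
set.seed(1)
# Simulated expression: 8 samples (rows) by 6 genes (columns).
expr <- matrix(rnorm(8 * 6), nrow = 8,
               dimnames = list(NULL, paste0("gene", 1:6)))
# Make gene2 track gene1 closely so at least one strong edge exists.
expr[, "gene2"] <- expr[, "gene1"] + 0.01 * expr[, "gene2"]

soft_power     <- 5    # same value as input_mySoftPower
edge_threshold <- 0.4  # same value as input_edgeThreshold

# Unsigned adjacency: absolute correlation raised to the soft power.
adjacency <- abs(cor(expr))^soft_power
diag(adjacency) <- 0

# Keep each gene pair once (upper triangle) if its weight passes the threshold.
idx <- which(adjacency > edge_threshold & upper.tri(adjacency), arr.ind = TRUE)
edges <- data.frame(
  from   = colnames(adjacency)[idx[, "row"]],
  to     = colnames(adjacency)[idx[, "col"]],
  weight = adjacency[idx],
  stringsAsFactors = FALSE
)
print(edges)
```

Raising the threshold thins the network; lowering it adds weaker connections.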

 input_removeRedudantSets <- TRUE   #Remove redundant gene sets 
 results <- networkModuleGO()  #Enrichment analysis of selected module
 results$adj.Pval <- format( results$adj.Pval,digits=3 )
 kable( results, row.names=FALSE) %>%
  kable_styling(bootstrap_options = c("striped", "hover")) %>%
  scroll_box(width = "100%") 
adj.Pval  Genes  Pathways
1.4e-64     222  Response to abiotic stimulus
9.3e-44     182  Response to organic substance
9.3e-44     161  Response to oxygen-containing compound
1.0e-40     159  Response to hormone
1.8e-40     160  Response to endogenous stimulus
1.1e-38     139  Response to external stimulus
3.9e-36     126  Response to acid chemical
1.4e-32     138  Cellular response to chemical stimulus
1.4e-31     126  Multi-organism process
3.9e-30      50  Cellular response to decreased oxygen levels